1. Introduction

During this grant we have focused on four tasks related to the POSTGRES project, namely:

1) refinement and implementation of the POSTGRES rules system

2) integration of tertiary memory support into POSTGRES

3) efficient support for very large arrays

4) efficient support for expensive predicates

In the rest of this report, we discuss each topic in turn.


2. Rules System

Near the end of the previous ARO grant, we developed a collection of algorithms for supporting virtual
classes in POSTGRES as well as alternate versions of classes. This work was published in [STON90].
Briefly, we discovered that both functions could be implemented by innovative uses of the POSTGRES
rules system. As such, the special-purpose, low-level code that traditional systems require to support
these constructs can be replaced by a small collection of rules written in the POSTGRES rule language.
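To make the idea concrete, here is a minimal sketch, assuming a toy in-memory store, of how a virtual class reduces to a rule: a rule attached to retrievals on the virtual class runs its action "instead" of scanning stored data. The names Database, RetrieveRule, EMP and TOYEMP are hypothetical illustrations, not the POSTGRES interface or rule syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class RetrieveRule:
    """Hypothetical 'on retrieve to <target> do instead <action>' rule."""
    target: str                                  # name of the virtual class
    action: Callable[["Database"], List[Row]]    # computes its rows from base classes


class Database:
    def __init__(self) -> None:
        self.classes: Dict[str, List[Row]] = {}  # stored (base) classes
        self.rules: Dict[str, RetrieveRule] = {}

    def define_rule(self, rule: RetrieveRule) -> None:
        self.rules[rule.target] = rule

    def retrieve(self, cls: str) -> List[Row]:
        # A retrieve rule on cls replaces the scan of stored data;
        # this is what makes cls "virtual": it has no storage of its own.
        if cls in self.rules:
            return self.rules[cls].action(self)
        return list(self.classes[cls])


db = Database()
db.classes["EMP"] = [
    {"name": "Joe", "dept": "toy", "salary": 10},
    {"name": "Sam", "dept": "shoe", "salary": 20},
]
# TOYEMP is defined entirely by a rule over the base class EMP.
db.define_rule(RetrieveRule(
    target="TOYEMP",
    action=lambda d: [r for r in d.retrieve("EMP") if r["dept"] == "toy"],
))
print(db.retrieve("TOYEMP"))  # [{'name': 'Joe', 'dept': 'toy', 'salary': 10}]
```

The point of the sketch is that no special-purpose view machinery appears anywhere: the generic rule dispatch in retrieve is the only mechanism, and versions of classes could in principle be layered on the same dispatch.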

To validate the utility of this idea, we have implemented the rules for both functions in POSTGRES and
examined their performance. Specifically, we have found that performing version management with the
POSTGRES rules approach is competitive with the low-level code we had previously written in the
mid-1980s. Moreover, a rules-based system is much easier to implement and modify than one hard-coded
in a third-generation programming language.

In addition, our current implementation of rules provides "immediate activation", i.e., the action of
each rule is triggered at the moment the event specified in the rule becomes true. At times, however, a
user would like "deferred execution", i.e., rule activation delayed until the commit time of the
enclosing transaction. We have investigated how to perform deferred execution without maintaining
complex bookkeeping about the effects of a transaction during its execution. Unfortunately, there does
not seem to be an easy way to implement this functionality without extensive reworking of the current
code base. A paper on the options and problems in this area, along with some suggestions for future
investigation, appeared in the IEEE Transactions on Knowledge and Data Engineering [STON92].
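The difference between the two activation modes can be illustrated with a minimal sketch, assuming a toy transaction object whose rule actions are plain callbacks. The Transaction class and the run helper are hypothetical and are not POSTGRES code.

```python
from typing import Callable, List


class Transaction:
    """Toy transaction contrasting immediate and deferred rule activation."""

    def __init__(self, deferred: bool) -> None:
        self.deferred = deferred
        self.pending: List[Callable[[], None]] = []  # actions queued until commit
        self.log: List[str] = []

    def fire(self, action: Callable[[], None]) -> None:
        # Called when a rule's event becomes true.
        if self.deferred:
            self.pending.append(action)  # postpone to commit time
        else:
            action()                     # immediate activation

    def commit(self) -> None:
        # Deferred actions run in the order they were queued, then we commit.
        for action in self.pending:
            action()
        self.pending.clear()
        self.log.append("commit")


def run(deferred: bool) -> List[str]:
    txn = Transaction(deferred)
    txn.log.append("update 1")
    txn.fire(lambda: txn.log.append("rule action"))
    txn.log.append("update 2")
    txn.commit()
    return txn.log


print(run(deferred=False))  # ['update 1', 'rule action', 'update 2', 'commit']
print(run(deferred=True))   # ['update 1', 'update 2', 'rule action', 'commit']
```

The sketch deliberately sidesteps the hard part described above: a real deferred action should see the net effect of the whole transaction (for example, a record inserted and later deleted), which is exactly the bookkeeping the text hopes to avoid.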

Lastly, we have applied the DBMS rules system to a substantial real-world problem to validate the
concepts. In particular, we have implemented the notion of calendars for a financial services time
series application using the POSTGRES rules system. A report on this work appeared in the 1994 IEEE
Data Engineering Conference [CHAN94].


3. Tertiary Memory Support

We have worked on two separate problems in this area, namely the efficient integration of tertiary
memory support into a DBMS query optimizer and the provision of a file system interface on top of a
DBMS. We discuss each topic in turn.

First, we have performed a detailed study of query optimization in a tertiary memory context.
Specifically, we have examined the query processing algorithms that DBMSs use to perform restrictions,
projections, and joins, and have discovered versions of these algorithms optimized for tertiary memory.
Moreover, we have investigated the optimal scheduling of the robot arm that moves data between tertiary
memory and the CPU. We have found that substantial performance improvements are available through
careful scheduling of the "batch" of requests outstanding in a multi-user environment. A report on this
work has been accepted at the 1995 Very Large Data Bases Conference [SARA95].
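Why batching helps can be seen with a minimal sketch, assuming a single drive, requests tagged with a hypothetical (medium, block) pair, and a cost measured only in media exchanges by the robot arm. The grouping heuristic below is a simple illustration, not the scheduling algorithm of [SARA95].

```python
from typing import Dict, List, Tuple

Request = Tuple[str, int]  # (medium id, block number); hypothetical format


def count_mounts(order: List[Request]) -> int:
    """Number of media exchanges the robot arm performs to serve requests in order."""
    mounts, loaded = 0, None
    for medium, _ in order:
        if medium != loaded:
            mounts += 1
            loaded = medium
    return mounts


def batch_schedule(batch: List[Request]) -> List[Request]:
    """Group outstanding requests by medium (in order of first appearance)
    and sort blocks within each medium so every medium is mounted once."""
    first_seen: Dict[str, int] = {}
    for i, (medium, _) in enumerate(batch):
        first_seen.setdefault(medium, i)
    return sorted(batch, key=lambda r: (first_seen[r[0]], r[1]))


batch = [("A", 7), ("B", 3), ("A", 2), ("C", 5), ("B", 1), ("A", 9)]
print(count_mounts(batch))                  # 6: first-come, first-served
print(count_mounts(batch_schedule(batch)))  # 3: one mount per medium
```

Serving requests first-come, first-served forces an exchange on every request here, whereas looking at the whole outstanding batch lets each medium be mounted once; the larger the batch in a multi-user setting, the more such reordering is possible.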

Second, we have proposed simulating a standard operating system file system interface on top of a
DBMS-managed storage hierarchy. In current systems, the DBMS must run on top of the file system, with
serious consequences for both function and performance. This has led many commercial DBMSs to bypass
the file system and implement their systems directly on top of a "raw" disk. In contrast, it is possible
to reverse the two systems and implement a file system on top of a DBMS, a concept we call "Inversion".

Using the Inversion concept, every file system operation turns into a query to the DBMS. If a user
performs very small reads and writes, the performance of this approach may be problematic. However, if
a user is reading or writing very large objects, often to tertiary memory, there should be little, if
any, performance difference between the two approaches.
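A minimal sketch of the Inversion idea follows, with SQLite (via Python's standard sqlite3 module) standing in for POSTGRES. The chunks table, the 8 KB chunk size, and the write_file/read_file helpers are hypothetical choices for illustration, not the schema of the Inversion prototype. Each read(offset, length) becomes one range query, so a fixed per-query overhead is paid on every call: negligible for large transfers, dominant for tiny ones.

```python
import sqlite3

CHUNK = 8192  # bytes per stored chunk; an arbitrary choice for this sketch


def open_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE chunks (file TEXT, seq INTEGER, data BLOB, "
               "PRIMARY KEY (file, seq))")
    return db


def write_file(db: sqlite3.Connection, name: str, payload: bytes) -> None:
    # Writing a whole file becomes a delete plus an insert of fixed-size chunks.
    db.execute("DELETE FROM chunks WHERE file = ?", (name,))
    rows = [(name, i // CHUNK, payload[i:i + CHUNK])
            for i in range(0, len(payload), CHUNK)]
    db.executemany("INSERT INTO chunks VALUES (?, ?, ?)", rows)
    db.commit()


def read_file(db: sqlite3.Connection, name: str, offset: int, length: int) -> bytes:
    # A read(offset, length) call turns into one range query over the chunks.
    first, last = offset // CHUNK, (offset + length - 1) // CHUNK
    cur = db.execute("SELECT data FROM chunks WHERE file = ? AND seq BETWEEN ? AND ? "
                     "ORDER BY seq", (name, first, last))
    blob = b"".join(row[0] for row in cur)
    start = offset - first * CHUNK
    return blob[start:start + length]


db = open_store()
data = bytes(range(256)) * 100  # 25,600 bytes, stored as 4 chunks
write_file(db, "/gcm/run1", data)
assert read_file(db, "/gcm/run1", 8000, 500) == data[8000:8500]
```

Reading the whole file costs one query, while reading it a byte at a time would cost 25,600 queries; that asymmetry is the performance trade-off the paragraph above describes.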

We have implemented a prototype of the Inversion concept, and a report on this topic appeared in the
1993 IEEE Data Engineering Conference [STON93]. It demonstrates that very reasonable performance is
available as long as reads and writes involve large objects.


4. Storage of Multidimensional Arrays

Next, we have investigated the layout of very large multidimensional arrays on secondary and tertiary
memory. Specifically, in the companion Sequoia 2000 project, we have used POSTGRES to support the DBMS
needs of a collection of atmospheric scientists who use General Circulation Models (GCMs). They wish to
store the output of their models, which takes the form of a four-dimensional array, in a database, and
subsequently to extract various projections of this array data, an operation often called "creating a
hyperslab".

We have found that, for typical hyperslab workloads, storing arrays in "Fortran order" is very
inefficient, whereas "chunking" the array yields a marked speedup. A paper on this topic appeared in the
1994 IEEE Data Engineering Conference [SARA94].
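The effect of layout can be estimated by counting the storage blocks a hyperslab touches. The sketch below assumes a small 16x16x16x16 array, 256-element blocks, and 4x4x4x4 chunks stored one chunk per block; these sizes and the helper names are illustrative assumptions, not the parameters studied in [SARA94].

```python
from itertools import product
from math import prod
from typing import Sequence, Tuple

Slab = Sequence[Tuple[int, int]]  # half-open (lo, hi) index range per dimension


def blocks_fortran(shape: Sequence[int], slab: Slab, block: int) -> int:
    """Distinct blocks read when the array is stored in Fortran
    (column-major) order, i.e. the first index varies fastest."""
    touched = set()
    for idx in product(*(range(lo, hi) for lo, hi in slab)):
        offset, stride = 0, 1
        for i, n in zip(idx, shape):
            offset += i * stride
            stride *= n
        touched.add(offset // block)
    return len(touched)


def blocks_chunked(chunk: Sequence[int], slab: Slab) -> int:
    """Distinct chunks read when each chunk is stored as one block."""
    touched = set()
    for idx in product(*(range(lo, hi) for lo, hi in slab)):
        touched.add(tuple(i // c for i, c in zip(idx, chunk)))
    return len(touched)


shape, chunk = (16, 16, 16, 16), (4, 4, 4, 4)
BLOCK = prod(chunk)  # 256 elements, so one chunk fills one block

slab_ab = [(0, 1), (0, 1), (0, 16), (0, 16)]  # fix the two fastest dimensions
slab_cd = [(0, 16), (0, 16), (0, 1), (0, 1)]  # fix the two slowest dimensions
print(blocks_fortran(shape, slab_ab, BLOCK), blocks_chunked(chunk, slab_ab))  # 256 16
print(blocks_fortran(shape, slab_cd, BLOCK), blocks_chunked(chunk, slab_cd))  # 1 16
```

Both slabs contain 256 elements. Fortran order is ideal only when the slab runs along the fastest-varying dimensions and degrades to one block per element otherwise, while chunking keeps the cost moderate for every orientation; summed over the two slabs this sketch gives 257 blocks versus 32.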


5. Optimization of Expensive Functions

We have extended the POSTGRES optimizer to deal with functions that are expensive to compute.
For example, consider the following query:


    retrieve (EMP.name)
    where beard(EMP.picture) = "red" and EMP.age < 30


In this case, the first clause in the predicate consumes perhaps 100 million CPU instructions to
perform a pattern analysis of the image to determine whether the picture is of a person with a beard.
Moreover, the function must read a megabyte or more of data in the process. In contrast, the second
clause requires perhaps 100 instructions and the reading of four bytes. When the computational demands
of the clauses in a predicate differ this dramatically, it is crucial for the optimizer to be "smart"
about the order in which it processes them. In addition, when constructing the query plan, the optimizer
should consider delaying the processing of clauses involving expensive functions as long as possible.
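For a conjunction of independent selection clauses on a single class, a well-known rule is to evaluate clauses in ascending order of rank = (selectivity - 1) / cost per tuple, which minimizes the expected per-tuple cost when evaluation stops at the first failing clause. The sketch below applies that rule to the example query; the selectivities (0.1 and 0.5) are invented for illustration, the costs come from the estimates above, and the code ignores joins, where deciding how long to delay an expensive clause is the harder problem treated in the work cited below.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Pred:
    name: str
    cost: float         # CPU cost per tuple (instructions)
    selectivity: float  # fraction of tuples that pass; assumed independent


def rank(p: Pred) -> float:
    # More negative rank = cheaper and/or more selective, so apply it earlier.
    return (p.selectivity - 1.0) / p.cost


def expected_cost(order: Sequence[Pred]) -> float:
    """Expected per-tuple cost when evaluation stops at the first failing clause."""
    total, surviving = 0.0, 1.0
    for p in order:
        total += surviving * p.cost
        surviving *= p.selectivity
    return total


def order_by_rank(preds: Sequence[Pred]) -> List[Pred]:
    return sorted(preds, key=rank)


beard = Pred('beard(EMP.picture) = "red"', cost=1e8, selectivity=0.1)  # assumed selectivity
young = Pred("EMP.age < 30", cost=100, selectivity=0.5)              # assumed selectivity

best = order_by_rank([beard, young])
print([p.name for p in best])         # ['EMP.age < 30', 'beard(EMP.picture) = "red"']
print(expected_cost([beard, young]))  # 100000010.0
print(expected_cost(best))            # 50000100.0
```

Even though the beard clause is more selective, testing the cheap age clause first roughly halves the expected cost here, because half of the tuples never reach the expensive image analysis.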

We presented preliminary results of an approach to this problem in [STON91]. A more extensive analysis
of the topic appeared in the 1993 ACM SIGMOD annual conference [HELL93]. At the current time, these
algorithms have been fully implemented in the POSTGRES DBMS, as well as in the commercial version of the
code line, a product called Illustra.